Consider the `pigs` series, the number of pigs slaughtered in Victoria each month.
- Use the `ses` function in R to find the optimal values of \(\alpha\) and \(\ell_0\), and generate forecasts for the next four months.
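The output below was produced by a call along the following lines (a sketch, assuming the fpp2 package; the object name `fcast.ses` is reused in later chunks):

```r
# Load fpp2, which provides ses() and the pigs data
library(fpp2)

# Fit simple exponential smoothing and forecast four months ahead
fcast.ses <- ses(pigs, h = 4)
summary(fcast.ses)
```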
##
## Forecast method: Simple exponential smoothing
##
## Model Information:
## Simple exponential smoothing
##
## Call:
## ses(y = pigs, h = 4)
##
## Smoothing parameters:
## alpha = 0.2971
##
## Initial states:
## l = 77260.0561
##
## sigma: 10309
##
## AIC AICc BIC
## 4463.0 4463.1 4472.7
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 385.87 10254 7961.4 -0.92265 9.274 0.79662 0.012822
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Sep 1995 98816 85605 112027 78612 119021
## Oct 1995 98816 85035 112598 77739 119894
## Nov 1995 98816 84486 113146 76900 120732
## Dec 1995 98816 83958 113674 76093 121540
- Compute a 95% prediction interval for the first forecast using \(\hat{y} \pm 1.96s\) where \(s\) is the standard deviation of the residuals. Compare your interval with the interval produced by R.
sd.ses <- sqrt(var(residuals(fcast.ses)))
f1 <- head(fcast.ses[["mean"]], 1)
c(Lower95 = f1 - 1.96 * sd.ses,
  Upper95 = f1 + 1.96 * sd.ses)
## Lower95 Upper95
## 78680 118953
The intervals are close but not identical. This is because R estimates the residual variance differently, properly accounting for the degrees of freedom.
sd.ses <- sqrt(sum(residuals(fcast.ses)^2)/(length(pigs)-2))
c(Lower95 = f1 - 1.96 * sd.ses,
  Upper95 = f1 + 1.96 * sd.ses)
## Lower95 Upper95
## 78612 119021
Write your own function to implement simple exponential smoothing. The function should take arguments `y` (the time series), `alpha` (the smoothing parameter \(\alpha\)) and `level` (the initial level \(\ell_0\)). It should return the forecast of the next observation in the series. Does it give the same forecast as `ses`?
myses <- function(y, alpha, level)
{
n <- length(y)
yhat <- numeric(n+1)
yhat[1] <- level
for(i in 2:(n+1)) {
yhat[i] <- alpha * y[i-1] + (1 - alpha) * yhat[i-1]
}
return(tail(yhat,1))
}
c(my_forecast = myses(pigs, 0.2971, 77260.0561),
  R_forecast = ses(pigs, h = 1)[["mean"]])
## my_forecast R_forecast
## 98816 98816
Modify your function from Q2 to return the sum of squared errors rather than the forecast of the next observation. Then use the `optim` function to find the optimal values of \(\alpha\) and \(\ell_0\). Do you get the same values as the `ses` function?
myses.sse <- function(par, y)
{
alpha <- par[1]
level <- par[2]
n <- length(y)
yhat <- numeric(n)
yhat[1] <- level
for(i in 2:n) {
yhat[i] <- alpha * y[i-1] + (1 - alpha) * yhat[i-1]
}
return(sum((y - yhat)^2))
}
options(digits = 8)
optim(c(0.1, pigs[1]), myses.sse, y = pigs)$par
## [1] 2.9711897e-01 7.7265875e+04
For comparison, the estimates from `ses` were:
## alpha l
## 2.9714883e-01 7.7260056e+04
The estimates are similar but not identical. This is due to the different starting values used in the optimisation.
myses.sse <- function(par, y)
{
alpha <- par[1]
level <- par[2]
n <- length(y)
yhat <- numeric(n)
yhat[1] <- level
for(i in 2:n) {
yhat[i] <- alpha * y[i-1] + (1 - alpha) * yhat[i-1]
}
return(sum((y - yhat)^2))
}
myses <- function(y, h=10)
{
par <- optim(c(0.1, y[1]), myses.sse, y=y)$par
alpha <- par[1]
level <- par[2]
n <- length(y)
yhat <- numeric(n+1)
yhat[1] <- level
for(i in 2:(n+1)) {
yhat[i] <- alpha * y[i-1] + (1 - alpha) * yhat[i-1]
}
return(rep(yhat[n+1], h))
}
# Usage:
myses(pigs)
## [1] 98816 98816 98816 98816 98816 98816 98816 98816 98816 98816
For comparison, `ses(pigs, h = 10)` gives:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Sep 1995 98816 85605 112027 78612 119021
## Oct 1995 98816 85035 112598 77739 119894
## Nov 1995 98816 84486 113146 76900 120732
## Dec 1995 98816 83958 113674 76093 121540
## Jan 1996 98816 83449 114184 75313 122320
## Feb 1996 98816 82955 114678 74559 123074
## Mar 1996 98816 82476 115156 73827 123806
## Apr 1996 98816 82012 115621 73116 124517
## May 1996 98816 81559 116074 72424 125209
## Jun 1996 98816 81118 116515 71749 125883
Data set `books` contains the daily sales of paperback and hardcover books at the same store. The task is to forecast the next four days’ sales for paperback and hardcover books.
- Plot the series and discuss the main features of the data.
Both series are trended, with a similar slope. There is no obvious weekly seasonality.
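A sketch of the plot for this part (assuming fpp2 is loaded):

```r
# Time plots of the two book sales series on common axes
autoplot(books) +
  xlab("Day") + ylab("Books sold")
```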
- Use the `ses` function to forecast each series, and plot the forecasts.
fcast1 <- ses(books[,"Hardcover"], h=4)
fcast2 <- ses(books[,"Paperback"], h=4)
autoplot(books) +
autolayer(fcast1, series="Hardcover", PI=FALSE) +
autolayer(fcast2, series="Paperback", PI=FALSE)
- Compute the RMSE values for the training data in each case.
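The values below come from `accuracy()` applied to each fitted object (using `fcast1` and `fcast2` from the previous chunk):

```r
# Training-set accuracy measures; the RMSE column is the one to compare
accuracy(fcast1)
accuracy(fcast2)
```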
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 9.1667 31.931 26.773 2.6362 13.395 0.79879 -0.14178
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 7.176 33.638 27.843 0.47361 15.578 0.70213 -0.21175
- Now apply Holt’s linear method to the `paperback` and `hardback` series and compute four-day forecasts in each case.
fcast1 <- holt(books[,"Hardcover"], h=4)
fcast2 <- holt(books[,"Paperback"], h=4)
autoplot(books) +
autolayer(fcast1, series="Hardcover", PI=FALSE) +
autolayer(fcast2, series="Paperback", PI=FALSE)
- Compare the RMSE measures of Holt’s method for the two series to those of simple exponential smoothing in the previous question. (Remember that Holt’s method is using one more parameter than SES.) Discuss the merits of the two forecasting methods for these data sets.
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set -0.13579 27.194 23.156 -2.1148 12.163 0.69086 -0.032452
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set -3.7172 31.137 26.181 -5.5085 15.584 0.66021 -0.17508
There’s been a big reduction in RMSE, even allowing for the extra parameter. That’s not surprising, as these series are clearly trended.
- Compare the forecasts for the two series using both methods. Which do you think is best?
The forecasts from `holt` are all higher than those from `ses` because Holt’s method allows for the increasing trend.
Holt’s method is to be preferred here.
- Calculate a 95% prediction interval for the first forecast for each series, using the RMSE values and assuming normal errors. Compare your intervals with those produced using `ses` and `holt`.
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 31 250.17 212.74 287.61 192.92 307.43
## 32 253.48 216.04 290.91 196.22 310.73
## 33 256.78 219.34 294.21 199.53 314.03
## 34 260.08 222.65 297.52 202.83 317.33
## Low High
## 196.87 303.47
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 31 209.47 166.60 252.33 143.91 275.02
## 32 210.72 167.85 253.58 145.16 276.27
## 33 211.97 169.11 254.83 146.41 277.52
## 34 213.22 170.36 256.08 147.67 278.77
## Low High
## 148.44 270.50
As with Exercise 1(b), `holt()` estimates the standard deviation of the residuals using a method that takes account of the degrees of freedom, so the results are slightly different.
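The manual Low/High values above can be computed along these lines (a sketch for the hardcover series; `fcast1` is the `holt` object fitted earlier, and the same pattern applies to the paperback series):

```r
# RMSE of the training residuals
rmse <- accuracy(fcast1)[, "RMSE"]
# First point forecast
f1 <- fcast1$mean[1]
# Approximate 95% interval assuming normal errors
c(Low = f1 - 1.96 * rmse, High = f1 + 1.96 * rmse)
```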
For this exercise, use data set `eggs`, the price of a dozen eggs in the United States from 1900–1993. Experiment with the various options in the `holt()` function to see how much the forecasts change with damped trend, or with a Box-Cox transformation. Try to develop an intuition of what each argument is doing to the forecasts. [Hint: use `h=100` when calling `holt()` so you can clearly see the differences between the various options when plotting the forecasts.] Which model gives the best RMSE?
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 1.0926 26.382 19.042 -0.96935 9.6486 0.93933 0.03752
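The options can be compared with calls along these lines (a sketch; using `BoxCox.lambda()` to choose the transformation parameter is one reasonable choice):

```r
h <- 100
fc1 <- holt(eggs, h = h)                                # linear trend
fc2 <- holt(eggs, damped = TRUE, h = h)                 # damped trend
fc3 <- holt(eggs, lambda = BoxCox.lambda(eggs), h = h)  # Box-Cox transformed

# Plot the point forecasts together to see how the options differ
autoplot(eggs) +
  autolayer(fc1, series = "Holt", PI = FALSE) +
  autolayer(fc2, series = "Damped", PI = FALSE) +
  autolayer(fc3, series = "Box-Cox", PI = FALSE)

# Compare training RMSE across the fits
accuracy(fc1)
accuracy(fc2)
accuracy(fc3)
```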
Recall your retail time series data (from Exercise 3 in Section 2.10).
- Why is multiplicative seasonality necessary for this series?
myts <- readxl::read_excel("data/retail.xlsx", skip=1)[,"A3349873A"] %>%
  ts(frequency=12, start=c(1982,4))
autoplot(myts)
The seasonal variation increases with the level of the series. So we need to use multiplicative seasonality.
- Apply Holt-Winters’ multiplicative method to the data. Experiment with making the trend damped.
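A sketch of the two fits (using `myts` as created above):

```r
# Holt-Winters' multiplicative method, without and with a damped trend
fc1 <- hw(myts, seasonal = "multiplicative")
fc2 <- hw(myts, seasonal = "multiplicative", damped = TRUE)
autoplot(myts) +
  autolayer(fc1, series = "HW multiplicative", PI = FALSE) +
  autolayer(fc2, series = "HW damped", PI = FALSE)
```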
- Compare the RMSE of the one-step forecasts from the two methods. Which do you prefer?
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 0.11706 13.294 8.9919 -0.12177 3.9184 0.47489 0.086356
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 1.4149 13.305 9.0422 0.6106 3.9596 0.47755 0.040779
There is not much difference between these models, but the non-damped one has slightly better training RMSE. Also, we would expect the trend to continue, so I would prefer to use the non-damped version.
- Check that the residuals from the best method look like white noise.
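The test output below comes from `checkresiduals()` applied to the preferred (non-damped) fit, e.g.:

```r
# Residual diagnostics, including the Ljung-Box test
checkresiduals(hw(myts, seasonal = "multiplicative"))
```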
##
## Ljung-Box test
##
## data: Residuals from Holt-Winters' multiplicative method
## Q* = 40.4, df = 8, p-value = 2.7e-06
##
## Model df: 16. Total lags used: 24
There are significant correlations in the residuals, including at lags 1 and 2. So these residuals do not look like white noise.
- Now find the test set RMSE, while training the model to the end of 2010. Can you beat the seasonal naïve approach from Exercise 8 in Section 3.7?
myts %>% window(end=c(2010,12)) %>%
  hw(seasonal='multiplicative', damped=FALSE) %>%
  accuracy(x=myts)
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 0.030212 9.1074 6.5535 0.0019955 3.2934 0.41071 0.027529
## Test set 55.337714 70.1166 55.3377 15.1732078 15.1732 3.46798 0.352875
## Theil's U
## Training set NA
## Test set 1.305
The test set RMSE is 70.117 compared to 71.443 for the seasonal naïve method. So not a big difference, but Holt-Winters’ method is slightly better.
For the same retail data, try an STL decomposition applied to the Box-Cox transformed series, followed by ETS on the seasonally adjusted data. How does that compare with your best previous forecasts on the test set?
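One way to produce this (a sketch; `stlf()` with a `lambda` argument transforms the series, applies ETS to the seasonally adjusted data, and back-transforms the forecasts):

```r
myts %>%
  window(end = c(2010, 12)) %>%
  stlf(lambda = BoxCox.lambda(myts)) %>%
  accuracy(x = myts)
```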
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set -0.63743 8.131 5.8265 -0.29898 2.8602 0.36514 0.035926
## Test set 58.71961 73.704 58.7196 16.11340 16.1134 3.67992 0.393535
## Theil's U
## Training set NA
## Test set 1.368
That is worse (73.704 compared to 70.117).
For this exercise, use the quarterly UK passenger vehicle production data from 1977Q1–2005Q1 (data set `ukcars`).
- Plot the data and describe the main features of the series.
- Decompose the series using STL and obtain the seasonally adjusted data.
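A sketch of the decomposition and seasonal adjustment:

```r
# STL decomposition with a fixed seasonal pattern
decomp <- stl(ukcars, s.window = "periodic")
# Remove the seasonal component
ukcars.sa <- seasadj(decomp)
autoplot(ukcars.sa)
```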
- Forecast the next two years of the series using an additive damped trend method applied to the seasonally adjusted data. (This can be done in one step using `stlf` with arguments `etsmodel="AAN", damped=TRUE`.)
fcastHoltDamp <- stlf(ukcars, etsmodel="AAN", damped=TRUE)
autoplot(ukcars) +
autolayer(fcastHoltDamp, PI=FALSE)
- Forecast the next two years of the series using Holt’s linear method applied to the seasonally adjusted data (as before but with `damped=FALSE`).
fcastHolt <- stlf(ukcars, etsmodel="AAN", damped=FALSE)
autoplot(ukcars) +
autolayer(fcastHolt, PI=FALSE)
- Now use `ets()` to choose a seasonal model for the data.
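The model can be chosen automatically (the object name `ukcarsets` matches the plotting chunk further down):

```r
# Let ets() select the error, trend and seasonal components
ukcarsets <- ets(ukcars)
summary(ukcarsets)
```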
- Compare the RMSE of the ETS model with the RMSE of the models you obtained using STL decompositions. Which gives the better in-sample fits?
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 1.5513 23.321 18.49 0.04122 6.0428 0.60258 0.022627
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set -0.34127 23.295 18.16 -0.59708 5.9802 0.59184 0.021036
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 1.3139 25.232 20.179 -0.1571 6.629 0.65763 0.025733
Holt’s method does slightly better in-sample.
- Compare the forecasts from the three approaches. Which seems most reasonable?
autoplot(ukcars) +
autolayer(fcastHoltDamp, PI=FALSE, series="Damped Holt's") +
autolayer(fcastHolt, PI=FALSE, series="Holt's") +
autolayer(forecast(ukcarsets), PI=FALSE, series="ETS")
The forecasts are almost identical. So I’ll use Holt’s method, which has the smallest RMSE.
- Check the residuals of your preferred model.
## Warning in checkresiduals(fcastHolt): The fitted degrees of freedom is based on
## the model used for the seasonally adjusted data.
##
## Ljung-Box test
##
## data: Residuals from STL + ETS(A,A,N)
## Q* = 22.1, df = 4, p-value = 0.00019
##
## Model df: 4. Total lags used: 8
There is some significant autocorrelation at lags 4, 8, 11, 12 and 18. The ETS model does slightly better, although its residual autocorrelation is still significant.
##
## Ljung-Box test
##
## data: Residuals from ETS(A,N,A)
## Q* = 18.3, df = 3, p-value = 0.00038
##
## Model df: 6. Total lags used: 9
I’d probably go with the ETS model here due to its slightly better residual properties.
For this exercise, use the monthly Australian short-term overseas visitors data, May 1985–April 2005. (Data set: `visitors`.)
- Make a time plot of your data and describe the main features of the series.
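A sketch of the time plot:

```r
autoplot(visitors) +
  xlab("Year") + ylab("Visitors")
```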
- Split your data into a training set and a test set comprising the last two years of available data. Forecast the test set using Holt-Winters’ multiplicative method.
train <- window(visitors, end=end(visitors)-c(2,0))
fcast <- hw(train, h=24, seasonal="multiplicative")
autoplot(fcast) +
autolayer(visitors)
- Why is multiplicative seasonality necessary here?
Multiplicative seasonality is important in this example because the seasonal pattern increases in size in proportion to the level of the trend.
- Forecast the two-year test set using each of the following methods:
- an ETS model;
- an additive ETS model applied to a Box-Cox transformed series;
- a seasonal naïve method;
- an STL decomposition applied to the Box-Cox transformed data followed by an ETS model applied to the seasonally adjusted (transformed) data.
f1 <- forecast(ets(train))
f2 <- forecast(ets(train, lambda=0))
f3 <- snaive(train)
f4 <- stlf(train, lambda=0)
autoplot(visitors) +
autolayer(f1, PI=FALSE, series="ETS") +
autolayer(f2, PI=FALSE, series="Additive ETS with Box-Cox") +
autolayer(f3, PI=FALSE, series="Seasonal naïve") +
autolayer(f4, PI=FALSE, series="STL+ETS with Box-Cox")
- Which method gives the best forecasts? Does it pass the residual tests?
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 0.76401 14.535 10.577 0.10482 3.9948 0.40542 -0.053112
## Test set 72.19927 80.231 74.553 15.92028 16.8224 2.85777 0.587170
## Theil's U
## Training set NA
## Test set 1.1273
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 0.90833 14.409 10.556 0.24231 3.9998 0.40463 -0.040151
## Test set 72.73026 80.669 75.035 16.04152 16.9249 2.87625 0.583589
## Theil's U
## Training set NA
## Test set 1.1346
## ME RMSE MAE MPE MAPE MASE ACF1 Theil's U
## Training set 17.294 31.156 26.088 7.1924 10.2860 1.0000 0.63277 NA
## Test set 32.871 50.301 42.246 6.6408 9.9626 1.6194 0.57254 0.6594
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 0.49277 13.082 9.2752 0.021424 3.4803 0.35554 -0.076999
## Test set 73.98213 82.362 75.9447 16.266959 17.0192 2.91112 0.655312
## Theil's U
## Training set NA
## Test set 1.1475
The best method is seasonal naïve, although it fails the residuals tests:
##
## Ljung-Box test
##
## data: Residuals from Seasonal naive method
## Q* = 295, df = 24, p-value <2e-16
##
## Model df: 0. Total lags used: 24
- Compare the same four methods using time series cross-validation with the `tsCV` function instead of using a training and test set. Do you come to the same conclusions?
e <- matrix(NA,ncol=4,nrow=length(visitors))
f1 <- function(y,h){forecast(ets(y),h=h)}
e[,1] <- tsCV(visitors, f1)
f2 <- function(y,h){forecast(ets(y, lambda=0),h=h)}
e[,2] <- tsCV(visitors, f2)
e[,3] <- tsCV(visitors, snaive)
e[,4] <- tsCV(visitors, stlf, lambda=0)
colMeans(e^2, na.rm=TRUE)
## [1] 343.36 533.88 1074.97 294.13
Now the STLF method appears best (based on one-step forecasts), even though it was worst on the test set earlier.
The fets() function below returns ETS forecasts.
Apply `tsCV()` for a forecast horizon of \(h=4\), for both ETS and seasonal naïve methods applied to the `qcement` data. (Hint: use the newly created `fets()` and the existing `snaive()` functions as your forecast function arguments.)
## h=1 h=2 h=3 h=4
## 0.0069595 0.0105923 0.0141177 0.0184511
## h=1 h=2 h=3 h=4
## 0.017792 0.017828 0.017964 0.018106
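The `fets()` helper and the `tsCV()` calls were not shown in the extract; a minimal version consistent with the output above would be:

```r
# Return ETS forecasts of y, h steps ahead
fets <- function(y, h) {
  forecast(ets(y), h = h)
}

# Cross-validated errors for horizons 1 to 4
e1 <- tsCV(qcement, fets, h = 4)
e2 <- tsCV(qcement, snaive, h = 4)

# Mean squared error at each horizon
colMeans(e1^2, na.rm = TRUE)
colMeans(e2^2, na.rm = TRUE)
```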
The ETS results are much better for the first 3 horizons, but the snaive results are slightly better for \(h=4\). With a long series like this, I would expect ETS to do better as it should have no trouble estimating the parameters, and it will include trends if required.
Compare `ets`, `snaive` and `stlf` on the following six time series. For `stlf`, you might need to use a Box-Cox transformation. Use a test set of three years to decide what gives the best forecasts: `ausbeer`, `bricksq`, `dole`, `a10`, `h02`, `usmelec`.
A Box-Cox transformation does not seem to be required for this series.
train <- window(ausbeer, end=end(ausbeer)-c(3,0))
fc1 <- forecast(ets(train))
fc2 <- snaive(train)
fc3 <- stlf(train)
accuracy(fc1, ausbeer)
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set -0.34661 15.7820 11.9800 -0.064073 2.8643 0.75671 -0.194228
## Test set -0.57976 8.8393 8.2509 -0.106162 1.9677 0.52116 0.074261
## Theil's U
## Training set NA
## Test set 0.15694
## ME RMSE MAE MPE MAPE MASE ACF1 Theil's U
## Training set 3.3366 19.670 15.832 0.90440 3.7714 1.00000 0.00096908 NA
## Test set -3.0000 10.874 9.750 -0.62339 2.3192 0.61585 0.06636156 0.19757
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 0.64492 13.7795 10.6415 0.16432 2.5526 0.67216 -0.146951
## Test set -3.82948 9.4617 7.9709 -0.86682 1.9048 0.50348 0.030773
## Theil's U
## Training set NA
## Test set 0.15847
Here, `ets` does best (based on RMSE) over the three-year test set.
The other series are handled similarly.
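For example, `bricksq` shows variation that increases with the level, so a Box-Cox transformation is worth trying in `stlf` (a sketch following the same pattern as for `ausbeer`):

```r
train <- window(bricksq, end = end(bricksq) - c(3, 0))
fc1 <- forecast(ets(train))
fc2 <- snaive(train)
fc3 <- stlf(train, lambda = BoxCox.lambda(train))

# Compare test-set accuracy against the full series
accuracy(fc1, bricksq)
accuracy(fc2, bricksq)
accuracy(fc3, bricksq)
```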
Use `ets()` on some of these series: `bicoal`, `chicken`, `dole`, `usdeaths`, `lynx`, `ibmclose`, `eggs`. Does it always give good forecasts?
I’ve used a log transformation here to prevent the forecast intervals from going negative.
Again, a Box-Cox transformation is necessary. The point forecasts look too low, and the forecast intervals are far too wide. It might be necessary to only model the last few years of the data.
That looks a little more reasonable, although now the intervals are probably too narrow.
Here the cyclic behaviour of the lynx data is completely lost. ETS models are not designed to handle cyclic data, so there is nothing that can be done to improve this.
- Find an example where it does not work well. Can you figure out why?
See comments on lynx above.
Show that the point forecasts from an ETS(M,A,M) model are the same as those obtained using multiplicative Holt-Winters’ method.
Point forecasts from the multiplicative Holt-Winters’ method: \[ \hat{y}_{t+h|t} = (\ell_t + hb_t)s_{t-m+h_m^+} \]
An ETS(M,A,M) model is given by \[\begin{align*} y_t & = (\ell_{t-1}+b_{t-1})s_{t-m}(1+\varepsilon_t) \\ \ell_t & = (\ell_{t-1}+b_{t-1})(1+\alpha\varepsilon_t) \\ b_t & = b_{t-1} + \beta(\ell_{t-1}+b_{t-1})\varepsilon_t \\ s_t & = s_{t-m} (1+\gamma\varepsilon_t) \end{align*}\] So \(y_{T+h}\) is given by \[ y_{T+h} = (\ell_{T+h-1}+b_{T+h-1})s_{T+h-m}(1+\varepsilon_{T+h}) \] Replacing \(\varepsilon_{t}\) by zero for \(t>T\), and substituting in from the above equations, we obtain \[ \hat{y}_{T+h} = (\ell_{T+h-2}+2b_{T+h-2})s_{T+h-m} \] Repeating the process a few times leads to \[ \hat{y}_{T+h} = (\ell_{T}+hb_{T})s_{T+h-m} \] Doing the same thing for \(s_{T+h-m}\) gives \[ \hat{y}_{T+h|T} = (\ell_{T}+hb_{T})s_{T-m+h_m^+} \] as required.
Show that the forecast variance for an ETS(A,N,N) model is given by \[ \sigma^2\left[1+\alpha^2(h-1)\right]. \]
An ETS(A,N,N) model is defined as \[\begin{align*} y_t & = \ell_{t-1} + \varepsilon_{t} \\ \ell_{t} & = \ell_{t-1} + \alpha\varepsilon_{t}, \end{align*}\] where \(\varepsilon_t \sim \text{N}(0,\sigma^2)\), and \(h\)-step forecasts are given by \[ \hat{y}_{T+h|T} = \ell_T. \] So \[\begin{align*} y_{T+h} & = \ell_{T+h-1} + \varepsilon_{T+h} \\ & = \ell_{T+h-2} + \alpha \varepsilon_{T+h-1} + \varepsilon_{T+h} \\ & = \ell_{T+h-3} + \alpha \varepsilon_{T+h-2} + \alpha \varepsilon_{T+h-1} + \varepsilon_{T+h} \\ & \dots \\ & = \ell_{T} + \alpha \sum_{j=1}^{h-1} \varepsilon_{T+h-j} + \varepsilon_{T+h}. \end{align*}\] Therefore \[\begin{align*} \text{Var}(y_{T+h} | y_1,\dots,y_T) & = \alpha^2 \sum_{j=1}^{h-1} \sigma^2 + \sigma^2 \\ & = \sigma^2\left[ 1 + \alpha^2 (h-1)\right ]. \end{align*}\]
Write down the 95% prediction intervals for an ETS(A,N,N) model as a function of \(\ell_T\), \(\alpha\), \(h\) and \(\sigma\), assuming Gaussian errors.
Using previous result: \[ \ell_T \pm 1.96 \sigma \sqrt{ 1 + \alpha^2 (h-1)} \]
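The formula can be checked numerically against `ses()`, e.g. on the `pigs` series (a sketch; apart from the small discrepancy from using 1.96 rather than `qnorm(0.975)`, the two columns should agree):

```r
fc <- ses(pigs, h = 4, level = 95)
alpha <- fc$model$par["alpha"]
sigma <- sqrt(fc$model$sigma2)
# Final level estimate l_T
lT <- tail(fc$model$states[, "l"], 1)
h <- 1:4
# Upper 95% limits: manual formula vs ses() output
cbind(manual = lT + 1.96 * sigma * sqrt(1 + alpha^2 * (h - 1)),
      ses = fc$upper[, "95%"])
```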